TTS-Guided Training for Accent Conversion Without Parallel Data

Abstract

Accent Conversion (AC) seeks to change the accent of speech from one (source) to another (target) while preserving the speech content and speaker identity. However, many existing AC approaches rely on source-target parallel speech data during training or on reference speech at run-time. We propose a novel accent conversion framework without the need for either parallel data or reference speech. Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data. This TTS model and its hidden representations are expected to be associated only with the target accent. Then, a speech encoder is trained to convert the accent of the speech under the supervision of the pretrained TTS model. In doing so, the source-accented speech and its corresponding transcription are forwarded to the speech encoder and the pretrained TTS, respectively. The output of the speech encoder is optimized to be the same as the text embedding in the TTS system. At run-time, the speech encoder is combined with the pretrained TTS decoder to convert the source-accented speech toward the target. In the experiments, we converted English with two source accents (Chinese/Indian) to target accents (American/British/Canadian). Both objective metrics and subjective listening tests successfully validate that the proposed approach generates speech samples that are close to the target accent with high speech quality.
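
The training objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder, the mean-squared-error loss, the random stand-in data, and the assumption that the speech frames and the text embedding already have the same length are all hypothetical simplifications. The sketch only shows the core idea that the encoder is updated while the text embedding from the pretrained TTS model stays fixed.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length sequences of vectors."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def encode(weights, frames):
    """Toy linear speech encoder: one output vector per input frame."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]


def train_step(weights, frames, target, lr=0.05):
    """One gradient-descent step on the MSE between the encoder output
    and the frozen target embedding. Only the encoder weights change."""
    out = encode(weights, frames)
    n = len(frames) * len(weights)
    new = [row[:] for row in weights]
    for t, frame in enumerate(frames):
        for i in range(len(weights)):
            err = out[t][i] - target[t][i]
            for j, x in enumerate(frame):
                new[i][j] -= lr * 2.0 * err * x / n
    return new


random.seed(0)
dim_in, dim_out, steps = 4, 3, 5  # hypothetical sizes

# Stand-in for features of a source-accented utterance.
frames = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in range(steps)]
# Stand-in for the text embedding produced by the frozen, pretrained TTS
# model for the matching transcription (assumed already length-aligned).
text_embedding = [[random.uniform(-1, 1) for _ in range(dim_out)]
                  for _ in range(steps)]

weights = [[0.0] * dim_in for _ in range(dim_out)]
loss_before = mse(encode(weights, frames), text_embedding)
for _ in range(200):
    weights = train_step(weights, frames, text_embedding)
loss_after = mse(encode(weights, frames), text_embedding)
```

At run-time, under the same assumptions, the trained encoder output would be passed to the pretrained TTS decoder in place of the text embedding; that decoder is outside the scope of this sketch.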

Similar resources

Statistical machine translation without long parallel sentences for training data

In this study, we paid attention to the reliability of the phrase table. We have been using a phrase table built with Och’s method[2], and this method sometimes generates completely wrong phrase tables. We found that such phrase tables are caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. Also, we utilized general tools for statistical machine translat...

Foreign accent conversion in computer assisted pronunciation training

Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Here we propose a voice-transformation technique that can be used to generate the (arguably) ideal voice to imitate: the own voice of the le...

Map-based adaptation for speech conversion using adaptation data selection and non-parallel training

This study presents an approach to GMM-based speech conversion using maximum a posteriori probability (MAP) adaptation. First, a conversion function is trained using a parallel corpus containing the same utterances spoken by both the source and the reference speakers. Then a non-parallel corpus from a new target speaker is used for the adaptation of the conversion function which models the voic...

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences

We extend our recently proposed approach to cross-lingual TTS training to voice conversion, without using parallel training sentences. It employs Speaker Independent, Deep Neural Net (SIDNN) ASR to equalize the difference between source and target speakers and Kullback-Leibler Divergence (KLD) to convert spectral parameters probabilistically in the phonetic space via ASR senone posterior probab...

MLLR-based accent model adaptation without accented data

When the user has an accent different from the one the automatic speech recognition system was trained with, the performance of the system degrades. This is attributed to both acoustic and phonological differences between accents. The phonological differences between two accents are due to different phoneme inventories in two languages. Even for the same phoneme, foreigners and native speakers p...

Journal

Journal title: IEEE Signal Processing Letters

Year: 2023

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2023.3270079